KenSwQuAD—A Question Answering Dataset for Swahili Low-resource Language

Abstract

The need for question-answering (QA) datasets in low-resource languages is the motivation of this research, leading to the development of the Kencorpus Swahili Question Answering Dataset (KenSwQuAD). This dataset is annotated from raw story texts of Swahili, a low-resource language that is predominantly spoken in eastern Africa and in other parts of the world. Question-answering datasets are important for machine comprehension of natural language in tasks such as internet search and dialog systems. Machine learning systems need training data such as the gold-standard question-answering set developed in this research. The research engaged annotators to formulate QA pairs from Swahili texts collected by the Kencorpus project, a Kenyan languages corpus. The project annotated 1,445 texts from the total of 2,585 texts with at least 5 QA pairs each, resulting in a final dataset of 7,526 QA pairs. A quality assurance set of 12.5% of the annotated texts confirmed that the QA pairs were all correctly annotated. A proof of concept on applying the set to the QA task confirmed that the dataset can be usable for such tasks. KenSwQuAD has also contributed to the resourcing of the Swahili language.

Similar resources

Resource Analysis for Question Answering

This paper attempts to analyze and bound the utility of various structured and unstructured resources in Question Answering, independent of a specific system or component. We quantify the degree to which gazetteers, web resources, encyclopedia, web documents and web-based query expansion can help Question Answering in general and specific question types in particular. Depending on which resourc...

SQuAD Question Answering Dataset: CS224N Assn 4

We solve the contextual question answering problem, which is an essential part in many automated question-answering datasets. Recently the SQuAD dataset [1] was uploaded and there were several deep learning approaches proposed to solve this. We implement a modified version of one of them, the Dynamic Coattention model, as well as a simple baseline.

Question Answering on the SQuAD Dataset

We develop a deep learning framework for question answering on the Stanford Question Answering Dataset (SQuAD), blending ideas from existing state-of-the-art models to achieve results that surpass the original logistic regression baselines. Using a dynamic coattention encoder and an LSTM decoder, we achieved an F1 score of 55.9% on the hidden SQuAD test set. In this paper, we present the methodo...

Language Independent Passage Retrieval for Question Answering

Passage Retrieval (PR) is typically used as the first step in current Question Answering (QA) systems. Most methods are based on the vector space model allowing the finding of relevant passages for general user needs, but failing on selecting pertinent passages for specific user questions. This paper describes a simple PR method specially suited for the QA task. This method considers the struct...

Investigating Embedded Question Reuse in Question Answering

The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance through reuse of information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in a general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2023

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3578553